ABSTRACT


In 1964 P. Huber established the following minimax bias robustness result for
estimating the location $\mu$ in the $\epsilon$-contamination family
$F(x) = (1 - \epsilon)\Phi[(x - \mu)/s] + \epsilon H(x)$,
where $\Phi$ is the standard normal distribution and $H$ is an arbitrary distribution function:
The median minimizes the maximum asymptotic bias among all translation equivariant
estimates of location. However, the median's efficiency of $2/\pi$ at the Gaussian model
may be unacceptably low in some applications. This motivates one to solve the following
problem for the above $\epsilon$-contamination family: Among all location M-estimates,
find the one which minimizes the maximum asymptotic bias subject to a constraint
on efficiency at the Gaussian model. This problem is the dual form analog of Hampel's
optimality problem of minimizing the asymptotic variance at the nominal model
(e.g. the Gaussian model) subject to a bound on the gross-error sensitivity. We solve
the global problem completely for the case of a known scale parameter. The main
conclusion is that Hampel's heuristic is essentially correct: the resulting M-estimate
is based on a psi-function which is amazingly close, but not exactly equal, to the
Huber/Hampel optimal psi-function. It turns out that one pays only a relatively small
price in terms of increase in maximal bias for increasing efficiency from 64% to the
range 90%-95%. We also present a conjectured solution to the problem, based on
heuristic arguments and numerical calculations, when the nuisance scale parameter is
unknown.

Some  key  words:  bias-robustness,  minimax,  efficiency. 
1. INTRODUCTION

Consider the family of $\epsilon$-contaminated Gaussian distribution functions

(1.1)  $\mathcal{F} = \{F : F(x) = (1 - \epsilon)\Phi[(x - \mu)/s] + \epsilon H(x)\}$

where $0 < \epsilon < .5$ is fixed, $\Phi$ is the standard normal distribution, and $H$ is an arbitrary
distribution. The main focus will be on estimation of the location parameter $\mu$, with $s$
being a nuisance scale parameter.

In this setup, where the contamination distribution may be asymmetric, all the
"usual" robust estimates of $\mu$ will be biased asymptotically as well as in finite sample
sizes for many $F$ in the family $\mathcal{F}$. This problem was recognized by Huber (1964) in a
brief section of his seminal paper on robust M-estimation. Huber's primary focus was
on the restricted symmetric form of $\mathcal{F}$, where $H$ is constrained to be any symmetric
distribution, and for this family he obtained the asymptotic variance minimax
M-estimate of $\mu$. However, working with the full asymmetric family $\mathcal{F}$, Huber (1964) also
proved the following result: Among all translation equivariant estimates of location,
the sample median minimizes the maximum asymptotic bias over the family (1.1) with
$s = 1$. His solution also holds with $\Phi$ replaced by certain other symmetric distributions,
and for the class of all translation and scale equivariant estimates of location with $s$
unknown.

The minimax bias robustness problem can be stated formally for a class $\mathcal{T}$ of
location estimates and the family $\mathcal{F}$ given by (1.1) as follows. Assuming as usual that $\mathcal{T}$
contains only translation and scale equivariant estimates, one takes $\mu = 0$ and $s = 1$
without loss of generality. Let $T(F)$ be the asymptotic value of an estimate $T \in \mathcal{T}$,
and let $b_T(\epsilon, F)$ be the asymptotic bias of $T$ at $F$. Since $\mu = 0$, we have

$b_T(\epsilon, F) = T(F)$.

Then the maximum asymptotic bias of $T$ over $\mathcal{F}$ is

(1.2)  $B_T(\epsilon) = \sup_{F \in \mathcal{F}} b_T(\epsilon, F)$.

A minimax estimate $T^*$ is one which satisfies

(1.3)  $T^* = \arg\min_{T \in \mathcal{T}} B_T(\epsilon)$

for each $\epsilon \in (0, .5)$. In general, any estimate which minimizes the maximum asymptotic
bias with respect to specified classes of estimates and mixture distributions will be
called a bias-robust estimate.

Curiously enough, the global problem of constructing bias robust estimates was
ignored for many years following Huber (1964). Only quite recently do we find a
number of results along these lines for problems such as minimum distance estimation
(Donoho and Liu, 1988 a and b), estimation of scale (Martin and Zamar, 1989; Martin
and Zamar, 1990), regression (Martin, Yohai and Zamar, 1989; Yohai, 1990), and
covariance matrices (Maronna and Yohai, 1989). While Huber (1981) found that the
bias robustness problem produced "a rather uneventful theory" in the case of estimating
location, the results cited above indicate that this is not the case for other kinds of
parameter estimation problems.

Of course, one criticism of bias robustness is that this kind of robustness might be
achieved at the expense of a severe loss of efficiency at the central model, e.g., at $\Phi$ in
(1.1). Indeed this is the case to some extent for the median as a bias robust estimate of
location, and to a much more serious extent in the case of regression: Martin, Yohai,
and Zamar (1989) show that among all M-estimates of regression based on bounded
rho-functions, the bias robust estimate minimizes a quantile of the absolute residuals.
This bias robust estimate has the same slow rate of convergence as the least median
of squared residuals (Rousseeuw, 1984; Rousseeuw and Leroy, 1988), which turns out
to be a quite good approximation to the bias robust M-estimate of regression (Martin,
Yohai and Zamar, 1989).

The slow rate of convergence of the bias robust estimate could be avoided by
imposing an efficiency constraint at the central model, and this approach could lead to a
useful tradeoff between Gaussian-case efficiency and bias control. Because of the
relative simplicity of the location problem, we initiated our efforts to construct
efficiency-constrained bias robust estimates on the location problem for the
$\epsilon$-contamination model (1.1). Thus our problem is to solve:

(1.4)  $T^* = \arg\min_{T \in \mathcal{T}} B_T(\epsilon)$

subject to $\mathrm{EFF}(T, \Phi) \ge e$, where $\mathrm{EFF}(T, \Phi)$ is the asymptotic efficiency of $T$ at the
standard normal distribution $\Phi$, with $T$ in the class $\mathcal{T}$ of M-estimates of location, and
$e \in (0, 1)$ a prescribed efficiency.

We remark that this problem could equally well be stated in the dual form

(1.5)  $T^* = \arg\min_{T \in \mathcal{T}} \mathrm{VAR}(T, \Phi)$

subject to $B_T(\epsilon) \le b$, where $\mathrm{VAR}(T, \Phi)$ is the asymptotic variance of $T$ at $\Phi$ and $b > 0$
is a prescribed bound on the maximum bias. Stated in this form, it is clear that our
problem of interest is a global form of Hampel's well-known local optimality problem
(see Hampel, 1968, 1974 and Hampel et al., 1986): Minimize the variance at the
central model, subject to a bound on the gross-error sensitivity (GES). The latter
provides, under regularity, a local linear approximation to the maximum bias of an
estimate for small $\epsilon$ (see Section 2.2). Fortuitously, the technique of proof originally
used by Hampel for his local optimality problem turned out to be a key ingredient in
establishing our global result.


2.  MAXIMUM  ASYMPTOTIC  BIAS  WITH  NUISANCE  SCALE 
2.1  The  Maximum  Bias  Functional 
A location M-estimate $T_n$ is a solution of

(2.1)  $\sum_{i=1}^{n} \psi[(y_i - T_n)/s_n] = 0$.

We work with the following assumptions:

• (A1) $\psi$ is continuous, monotone, odd, and bounded.

• (A2) $s_n$ is an estimate of scale whose almost sure limit $\lim_{n \to \infty} s_n$ defines a scale
functional $S(F)$ for all $F \in \mathcal{F}$ with the boundedness property: $0 < S(F) \le
S(F^\infty) = \bar{s} < \infty$ for all $F \in \mathcal{F}$, where $F^\infty = (1 - \epsilon)F_0 + \epsilon\delta_\infty$ and $\delta_\infty$ is a
point-mass at infinity.

It is easy to check, using (2.2) below, that under the contamination model (1.1) the
maximum asymptotic bias is unbounded for unbounded $\psi$. Thus the boundedness part
of A1 entails no loss of generality. Formulas for computing $\bar{s}$ for the case of M-estimates
of scale can be found in Martin and Zamar (1990).

Huber (1981), Section 3.2, Corollary 2.2 shows that under (A1) and (A2) $T_n$
converges almost surely to a functional $T(F) = T(\psi, F)$ provided this functional is
uniquely defined by the asymptotic estimating equation

(2.2)  $\int \psi[(y - T(F))/S(F)]\,dF(y) = 0$.

It is not difficult to see that for our setup $T(F)$ is uniquely defined for all $F \in \mathcal{F}$. First
let

(2.3)  $g(t, s) = g_\psi(t, s) = -\int_{-\infty}^{\infty} \psi[(y - t)/s]\,\varphi(y)\,dy, \quad s > 0$,

where $\varphi(y)$ is the standard normal density, and note that for $t > 0$, we have

(2.4)  $g(t, s) = s \int_0^\infty \psi(y)[\varphi(sy - t) - \varphi(sy + t)]\,dy$.

For all $F \in \mathcal{F}$, the equation (2.2), with $T(F)$ replaced by $t$, can be written as

(2.5)  $-(1 - \epsilon)\,g[t, S(F)] + \epsilon \int \psi[(y - t)/S(F)]\,dH(y) = 0$.

The function $g(t, s)$ is strictly increasing and continuous in $t$, with
$\lim_{t \to \infty} g(t, s) = \psi(\infty)$ and $\lim_{t \to -\infty} g(t, s) = -\psi(\infty)$. Therefore, for all
$\epsilon < .5$, the left-hand-side is positive for sufficiently large positive $t$, and negative
for sufficiently small negative $t$. It follows that the solution (in $t$) of this equation
defines a unique functional $T(F)$.

Furthermore $g(t, s)$ is strictly decreasing in $s$ for $s \in (0, 1)$ with $\lim_{s \to 0} g(t, s) =
[2\Phi(t) - 1]\psi(\infty)$. From this and the above observations it is easy to verify that under
(A1) and (A2) the maximum asymptotic bias $B_\psi(\epsilon)$ is achieved when $H$ is the point
mass $\delta_\infty$ at infinity. Thus $B_\psi(\epsilon)$ satisfies the equation

(2.6)  $-(1 - \epsilon)\,g[B_\psi(\epsilon), \bar{s}] + \epsilon\,\psi(\infty) = 0$.

We summarize this result as a lemma.

LEMMA 1: If (A1) and (A2) hold then

(2.7)  $B_\psi(\epsilon) = g^{-1}\left[\psi(\infty)\,\epsilon/(1 - \epsilon)\right]$

where $g^{-1}(\cdot)$ is the inverse of $g(\cdot, \bar{s})$ and $\bar{s} = S(F^\infty)$ is the maximum asymptotic bias
of the scale estimate.
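
As a rough numerical illustration of Lemma 1 (a sketch only, not the authors' code), the maximum bias can be obtained by evaluating g(t, s) in the form (2.4) with a composite Simpson rule and inverting it by bisection. The function names, the integration cutoff and the grid size are assumptions made for this example, and the scale bound is simply passed in as `s_bar`.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def huber_psi(c):
    """Huber psi-function with tuning constant c (bounded, odd, monotone)."""
    return lambda x: max(-c, min(c, x))

def sign_psi(x):
    """Sign function, the psi-function of the median."""
    return float((x > 0) - (x < 0))

def g(t, psi, s=1.0, n=2000):
    """g(t, s) as in (2.4), by composite Simpson on [0, (|t| + 10) / s]."""
    upper = (abs(t) + 10.0) / s   # assumed cutoff: the integrand is negligible beyond it
    h = upper / n                 # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        y = i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += w * psi(y) * (phi(s * y - t) - phi(s * y + t))
    return s * total * h / 3.0

def max_bias(eps, psi, psi_inf, s_bar=1.0):
    """Solve (1 - eps) * g(B, s_bar) = eps * psi(inf) for B, as in (2.6)."""
    target = psi_inf * eps / (1.0 - eps)
    lo, hi = 0.0, 1.0
    while g(hi, psi, s_bar) < target:   # enlarge the bracket until it contains the root
        hi *= 2.0
    for _ in range(50):                 # plain bisection; g is increasing in t
        mid = 0.5 * (lo + hi)
        if g(mid, psi, s_bar) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `max_bias(0.1, huber_psi(1.5), 1.5)` gives the maximum bias of a Huber estimate with c = 1.5 and known scale, while `max_bias(0.1, sign_psi, 1.0)` gives that of the median.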
2.2 Relation with the Gross-Error-Sensitivity

The influence function of a location M-estimate $T(F)$ with psi-function $\psi$ at $F_0$
and $x$ (see Hampel 1968, 1974) is given by

$IC_\psi(x) = \frac{d}{d\epsilon}\,T[(1 - \epsilon)F_0 + \epsilon\delta_x]\Big|_{\epsilon = 0} = \frac{\psi(x)}{g_{10}}$

where $\delta_x$ is a point-mass at $x$,

$g_{\alpha\beta}(t, s) = (\partial^\alpha/\partial t^\alpha)(\partial^\beta/\partial s^\beta)\,g(t, s), \quad \alpha, \beta = 0, 1, 2, \ldots$

and $g_{\alpha\beta} = g_{\alpha\beta}(0, 1)$. The gross-error-sensitivity of $T$ at $F_0$, also introduced by Hampel
(1968), is given by

$GES(\psi) = \sup_x |IC_\psi(x)| = \frac{\psi(\infty)}{g_{10}}$.

One expects that under sufficient regularity conditions the GES will provide a local
linear approximation to $B_\psi(\epsilon)$ for $\epsilon$ near zero, that is, that the GES will be equal to
the derivative $B'_\psi(0)$ of the maximal bias function at the origin. The following
lemma shows that in fact (A1) and (A2) provide sufficient regularity.

LEMMA 2: Under (A1) and (A2) $B'_\psi(0) = GES(\psi)$.

Proof: Follows from (2.6) by differentiation.

Figure 1 gives the maximum bias curves, along with the local linear approximations
based on the GES, for the following two location estimators: (1) the median and (2)
Huber M-estimate with $c = 1.5$ using the shorth as scale (see comments at the
end of Section 4). Notice that the linear approximation is better for the median than
for the efficient Huber estimate.
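
To make the comparison between the exact maximum bias and its GES-based linear approximation concrete, here is a minimal sketch for the median only, where both quantities have closed forms: the exact bias is Phi^{-1}[0.5/(1 - eps)] and the GES at the standard normal is sqrt(pi/2). The function names are ours and the chosen contamination fractions are arbitrary; the Huber/shorth case of Figure 1 is not reproduced here.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def median_max_bias(eps):
    """Exact maximum asymptotic bias of the median: Phi^{-1}[0.5 / (1 - eps)]."""
    return STD_NORMAL.inv_cdf(0.5 / (1.0 - eps))

def median_ges():
    """GES of the median at the standard normal: 1 / (2 * phi(0)) = sqrt(pi / 2)."""
    return math.sqrt(math.pi / 2.0)

def linear_approx(eps, ges):
    """Local linear approximation eps * GES to the maximum bias."""
    return eps * ges

# Tabulate exact bias against the linear approximation for a few eps values.
for eps in (0.01, 0.05, 0.10, 0.25):
    exact = median_max_bias(eps)
    approx = linear_approx(eps, median_ges())
    print(f"eps={eps:.2f}  exact={exact:.4f}  linear={approx:.4f}")
```

The linear approximation always falls below the exact value here, and the gap widens as eps grows, which is the sense in which the GES can be unconservative.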

2.3  The  Unconstrained  Bias  Robust  M-Estimate 

For the moment assume that $\psi(\infty) = 1$, which entails no loss of generality. From
(2.4) and (2.6) and the monotonicity of $g(t, \bar{s})$ in $t$, it follows that if $\psi_1$ and $\psi_2$ satisfy
$g_{\psi_1}(t, s) > g_{\psi_2}(t, s)$ for all $t$ then $B_{\psi_1}(\epsilon) < B_{\psi_2}(\epsilon)$. Thus $B_\psi(\epsilon)$ can be minimized by
maximizing the function $g_\psi(t, \bar{s})$ for each $t > 0$. Noting that
$[\varphi(\bar{s}x - t) - \varphi(\bar{s}x + t)] > 0$
for all $x > 0$ and $t > 0$, one sees that the "sign" function $\psi(x) = \mathrm{sgn}(x)$ maximizes
$g_\psi(t, \bar{s})$ with respect to $\psi$. Thus the sample median is the bias-robust M-estimate of
location with minimax bias

$B_{MED}(\epsilon) = \Phi^{-1}[0.5/(1 - \epsilon)]$.

This is a particular case, for M-estimates, of the more general result obtained by Huber
(1964) for the class of all translation equivariant estimates of location.

3.  EFFICIENCY  CONSTRAINED  SOLUTION  WITH  SCALE  KNOWN 

In  this  section  we  find  the  efficiency  constrained  bias-robust  M-estimate  of  location 
for  the  case  where  scale  is  known,  taking  s  =  1  without  loss  of  generality. 

3.1  Candidate  Solutions  via  Calculus  of  Variation 

First, we use calculus of variation to give a heuristic derivation of the optimal
psi-function $\psi^*$. In the next subsection we give a direct proof based on projection
methods.

By definition, an efficiency constrained bias-robust M-estimate of location solves
the following constrained minimization problem:

$\inf_\psi B_\psi(\epsilon)$

subject to

(3.1)  $V(\psi, \Phi) = \frac{\int_0^\infty \psi^2(x)\varphi(x)\,dx}{2\left(\int_0^\infty \psi'(x)\varphi(x)\,dx\right)^2} \le e^{-1}$,

where $V(\psi, \Phi)$ is the M-estimate asymptotic variance and $e$ is the desired efficiency.
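
For Huber's psi-function the variance in the efficiency constraint has a closed form, which gives a quick way to see how the tuning constant maps to Gaussian efficiency. The sketch below assumes known scale equal to one and uses the standard expression V = E[psi_c(X)^2] / (E[psi_c'(X)])^2; the function name is ours.

```python
import math

def huber_efficiency(c):
    """Gaussian efficiency 1 / V of the Huber M-estimate with tuning constant c.

    V = E[psi_c(X)^2] / (E[psi_c'(X)])^2 for X standard normal, scale known.
    """
    p_inside = math.erf(c / math.sqrt(2.0))                 # P(|X| <= c) = 2*Phi(c) - 1
    dens_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)  # phi(c)
    tail = 0.5 * math.erfc(c / math.sqrt(2.0))              # 1 - Phi(c)
    # E[psi_c^2] = integral of x^2 phi over [-c, c] plus c^2 * P(|X| > c)
    second_moment = p_inside - 2.0 * c * dens_c + 2.0 * c * c * tail
    variance = second_moment / (p_inside ** 2)
    return 1.0 / variance
```

As c tends to zero the value approaches the median's efficiency 2/pi, and it increases towards one as c grows, so an efficiency target e translates into a lower bound on c within the Huber family.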
In view of (2.4), (2.6) and the strict monotonicity of $g(t, s)$, it suffices to find, for
each $t > 0$, a score function $\psi_t$ which maximizes the functional

(3.2)  $J_t(\psi) = \frac{1}{\psi(\infty)} \int_0^\infty \psi(x)[\varphi(x - t) - \varphi(x + t)]\,dx$

subject to the given side constraint. Then, assuming that $J_t(\psi_t)$ is continuous and
monotone increasing in $t$, the solution to the constrained optimization problem will be
$\psi^* = \psi_{t_0}$, where $J_{t_0}(\psi_{t_0}) = \epsilon/(1 - \epsilon)$. Since the constraint in (3.1) is not an integral
constraint and since the objective function $J_t(\psi)$ is not an integral on a finite interval, it is
convenient to consider instead the following family of standard optimization problems:
Maximize $J_t(\psi)$ subject to the constraints

(B1) $\psi(0) = 0$, $\psi(c) = M$.

(B2) $2\int_0^c \psi^2(x)\varphi(x)\,dx + 2[1 - \Phi(c)]M^2 \le e^{-1}$.

(B3) $2\int_0^c \psi'(x)\varphi(x)\,dx = 1$.

where $c$ and $M$ are constants such that (B1)-(B3) can be simultaneously satisfied for
at least one $\psi$.

The functional $J_t(\psi)$ is linear, and hence convex, and the set of $\psi$-functions
satisfying (B1)-(B3) is convex. Thus we have a convex optimization problem for each fixed
$(c, M)$. At first sight, a natural approach is to solve the convex optimization
problem for each allowable pair $(c, M)$ and then optimize over all allowable $(c, M)$. The
Lagrangian for the problem with $(c, M)$ fixed is

$G(x, \psi, \psi') = -\psi(x)[\varphi(x - t) - \varphi(x + t)] + \lambda_1\psi^2(x)\varphi(x) + \lambda_2\psi'(x)\varphi(x)$

and, by convexity, a sufficient condition for optimality is that the Euler-Lagrange
equation be satisfied:

$\frac{\partial G}{\partial \psi} - \frac{d}{dx}\frac{\partial G}{\partial \psi'} = -[\varphi(x - t) - \varphi(x + t)] + 2\lambda_1\psi(x)\varphi(x) - \lambda_2\varphi'(x) = 0$.
Thus for fixed $(c, M)$, the optimal psi-function is of the form

$\psi(x) = \begin{cases} \alpha_1 x + \alpha_2\,\dfrac{\varphi(x - t) - \varphi(x + t)}{\varphi(x)}, & |x| \le c \\ M\,\mathrm{sgn}(x), & |x| > c. \end{cases}$

Notice that $\alpha_1 = 1$ and $\alpha_2 = 0$ gives Huber's $\psi$-function and $\alpha_1 = 0$ and $\alpha_2 = 1$ gives
a $\psi$-function which is proportional to a truncated hyperbolic sine function.

Unfortunately, as $c$ and $M$ vary we no longer have a convex optimization problem
and we were unable to make this variational argument rigorous. In the remainder of
this section we give a direct proof that a solution to the optimization problem actually
exists and is of this form.

3.2  The  Dual  Problem 

Let $B$ be an achievable maximum bias, that is $B = B_\psi(\epsilon)$ in (2.7) for some $\psi$
satisfying (A1) and (A2). Notice that if $B_{MED} = \Phi^{-1}[0.5/(1 - \epsilon)]$ is the maximum bias of the
median then $B \ge B_{MED}$. Let $\Psi_B$ be the set of psi-functions satisfying (A1), (A2) and

(C1) $2\int_0^\infty \psi(x)\,x\,\varphi(x)\,dx = 1$

(C2) $(1 - \epsilon)\,g_\psi(B, 1) = \epsilon\,\psi(\infty)$.

Since the asymptotic variance $\mathrm{VAR}(\psi, \Phi)$ and bias $B_\psi(\epsilon)$ are invariant under
multiplication of $\psi$ by a constant, the condition C1 above is just a convenient standardization.
Observe that given C1, $\psi(\infty)$ is the gross-error-sensitivity of the corresponding
M-estimate, so we will write $GES(\psi) = \psi(\infty)$. Also notice that C1 implies

(3.3)  $GES(\psi) \ge (\pi/2)^{1/2} = GES(\mathrm{Median})$.

The dual optimization problem can now be stated:

Fix $B > B_{MED}$ and find $\psi^* \in \Psi_B$ which minimizes

(3.4)  $J(\psi) = \int_0^\infty \psi^2(x)\varphi(x)\,dx$.
We need the following lemma, which states that for any fixed $B > B_{MED}$ and any
nice unbounded, odd function $\theta(x)$, there exists a rescaled truncated version of $\theta(x)$
which is in $\Psi_B$. We will denote by $\theta_c(x)$ the truncation of $\theta(x)$ at $c$, that is,

$\theta_c(x) = \begin{cases} \theta(x), & |x| \le c \\ \theta(c)\,\mathrm{sgn}(x), & |x| > c. \end{cases}$

LEMMA 3: Let $\theta(x)$ be differentiable, odd and monotone, with $\lim_{x \to \infty} \theta(x) = \infty$ and
$E_\Phi|\theta(X)| < \infty$. For each $B > B_{MED}$ there exist $c_0$ and $k_0$ such that $k_0\theta_{c_0}(x) \in \Psi_B$.

Proof: For fixed $c > 0$, the function

$\gamma(t, c) = \frac{1}{\theta(c)} \int_0^\infty \theta_c(x)[\varphi(x - t) - \varphi(x + t)]\,dx = -\frac{1}{\theta(c)} \int_{-\infty}^{\infty} \theta_c(x - t)\varphi(x)\,dx$

is continuous and monotone increasing in $t$. By the dominated convergence theorem
$\lim_{t \to 0} \gamma(t, c) = 0$ and $\lim_{t \to \infty} \gamma(t, c) = 1$. Thus, given $0 < c < \infty$ there exists
$B(c)$ such that $\gamma[B(c), c] = \epsilon/(1 - \epsilon)$. Since $B(c)$ is continuous and non-decreasing with
$\lim_{c \to \infty} B(c) = \infty$ and $\lim_{c \to 0} B(c) = B_{MED}$ there exists $c^*$ such that $B(c^*) = B$. The
lemma follows now with $c_0 = c^*$ and $k_0 = \left[2\int_0^\infty \theta_{c^*}(x)\,x\,\varphi(x)\,dx\right]^{-1}$.

3.3  Solution  of  the  Dual  Problem 

The  dual  problem  we  wish  to  solve  is  similar  to  Hampel’s  optimality  problem  of 
minimizing  the  variance  at  the  central  model  subject  to  a  bound  on  the  gross  error 
sensitivity.  See  Hampel  (1968)  and  Hampel  et  al.  (1986).  The  difference  of  course
is  that  we  are  replacing  the  bound  on  infinitesimal  bias  with  a  bound  on  the  actual 
maximum  bias.  Nonetheless,  Hampel’s  technique  for  obtaining  a  key  inequality  still 
provides  an  essential  step  in  the  solution  given  here. 
For each $b \ge GES(\mathrm{Median}) = b_0$ let $\Psi_{B,b}$ be the subset of all the $\psi \in \Psi_B$ with
gross-error-sensitivity equal to $b$. Then, clearly $\Psi_B = \bigcup_{b \ge b_0} \Psi_{B,b}$. Also, let $\psi_1(x)$ and
$\psi_2(x)$ be the truncated and scaled $\psi$-functions given by Lemma 3 corresponding to the
identity function $I(x) = x$ and to the function

$A(x) = \frac{\varphi(x - B) - \varphi(x + B)}{\varphi(x)} = e^{-B^2/2}\left(e^{Bx} - e^{-Bx}\right)$,

which is proportional to the hyperbolic sine. The gross-error sensitivities are

$GES_i = GES(\psi_i) = \psi_i(c_i), \quad i = 1, 2$

where $c_1$ and $c_2$ are the corresponding truncation constants.
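
The claim that A(x) is proportional to the hyperbolic sine can be checked numerically. The sketch below assumes the known-scale definition A(x) = [phi(x - B) - phi(x + B)] / phi(x), with B the target maximum bias; the helper names, including `truncate` for the truncation operation of Lemma 3, are ours.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def A_ratio(x, B):
    """A(x) written as the density ratio [phi(x - B) - phi(x + B)] / phi(x)."""
    return (phi(x - B) - phi(x + B)) / phi(x)

def A_sinh(x, B):
    """The same function written as 2 * exp(-B^2 / 2) * sinh(B * x)."""
    return 2.0 * math.exp(-0.5 * B * B) * math.sinh(B * x)

def truncate(theta, c):
    """Truncation theta_c of an odd, increasing function theta at c > 0."""
    return lambda x: theta(x) if abs(x) <= c else math.copysign(theta(c), x)
```

The two forms of A agree up to rounding, and `truncate(lambda x: A_sinh(x, B), c)` is the unscaled shape of the truncated hyperbolic-sine psi-function.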

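The following theorem shows that, in terms of $J(\psi)$, $\psi_1$ dominates all $\psi$ in $\Psi_{B,b}$
with $b_0 \le b \le GES_1$ and $\psi_2$ dominates all $\psi$ in $\Psi_{B,b}$ with $b \ge GES_2$.

THEOREM 1: Suppose that $\psi \in \Psi_B$ satisfies $GES(\psi) \le GES_1$ or $GES(\psi) \ge GES_2$.
Then $J(\psi) \ge \min\{J(\psi_1), J(\psi_2)\}$.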

Proof: Assume first that $b_0 \le b = GES(\psi) \le GES_1$. Since $GES(I_c) = .5c[\Phi(c) - .5]^{-1}$
is continuous in $c$, tends to $b_0 = (\pi/2)^{1/2}$ as $c \to 0$ and tends to $\infty$ as $c \to \infty$, there
exists $0 < c \le c_1$ such that $GES(I_c) = b$. The inequality $J(\psi) \ge \mathrm{VAR}(I_c, \Phi)$ now
directly follows from Hampel's result of optimality of $I_c$ among all functions which
satisfy A1-A2 and have GES bounded above by $b$ (see for example Hampel et al.
1986, Theorem 1, Section 2.4, p. 117; see also Theorem 5, Section 2.5d, p. 135). Then,
since $\mathrm{VAR}(I_c, \Phi)$ is a decreasing function of $c$, it follows that

$J(\psi) \ge \mathrm{VAR}(I_c, \Phi) \ge \mathrm{VAR}(I_{c_1}, \Phi) = J(\psi_1)$.


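We now turn to the optimality of $\psi_2$ over $\Psi_{B,b}$, $b \ge GES_2$. Assume that $b = GES(\psi) \ge
GES_2 = b_2$ and let

$\tilde\psi(x) = \frac{\psi(x)}{b}, \quad \tilde\psi_2(x) = \frac{\psi_2(x)}{b_2}, \quad \tilde A(x) = \frac{A(x)}{A(c_2)}, \quad \tilde\beta = \frac{\epsilon}{A(c_2)(1 - \epsilon)}$.

Observe that $b = \psi(\infty)$ and so, using C2, one obtains
$\int_0^\infty \tilde\psi(x)\tilde A(x)\varphi(x)\,dx = \tilde\beta$. A
similar argument shows that $\int_0^\infty \tilde\psi_2(x)\tilde A(x)\varphi(x)\,dx = \tilde\beta$. Using the last two equations
and the fact that $\tilde A(x) = \tilde\psi_2(x)$ for $0 \le x \le c_2$, $\tilde A(x) \ge 1$ for $x \ge c_2$ and $\tilde\psi(x) \le 1$ for
$x \ge 0$ it follows that

$\int_0^\infty \tilde\psi^2(x)\varphi(x)\,dx + d = \int_0^\infty [\tilde\psi(x) - \tilde A(x)]^2\varphi(x)\,dx$

$\ge \int_0^\infty [\tilde\psi_2(x) - \tilde A(x)]^2\varphi(x)\,dx$

$= \int_0^\infty \tilde\psi_2^2(x)\varphi(x)\,dx + d$,

where $d = -2\tilde\beta + \int_0^\infty \tilde A^2(x)\varphi(x)\,dx$ is a constant which does not depend on $\psi$. Therefore,

$J(\psi) = \int_0^\infty \psi^2(x)\varphi(x)\,dx \ge (b/b_2)^2 \int_0^\infty \psi_2^2(x)\varphi(x)\,dx \ge J(\psi_2)$,

completing the proof.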


The  following  theorem  is  our  main  result. 

THEOREM 2: For each $\psi \in \Psi_B$ there exist $c \in [c_1, c_2]$, $\alpha_1 \ge 0$, and $\alpha_2 \ge 0$ such that

$\phi_c(x) = \begin{cases} \alpha_1 I(x) + \alpha_2 A(x), & |x| \le c \\ [\alpha_1 I(c) + \alpha_2 A(c)]\,\mathrm{sgn}(x), & |x| > c \end{cases}$

belongs to $\Psi_B$ and $J(\psi) \ge J(\phi_c)$.

Proof: Let $\psi \in \Psi_B$ be such that

$GES(\psi_1) \le GES(\psi) \le GES(\psi_2)$.

Note that by Theorem 1, we only need to consider this case because if $GES(\psi_2) \le
GES(\psi_1)$ then the theorem trivially holds. To fix ideas, suppose that $c_1 > c_2$. The
cases $c_1 < c_2$ and $c_1 = c_2$ can be handled in a similar way.
The function $\phi_c(x) = \alpha_1(c)I_c(x) + \alpha_2(c)A_c(x)$ is in $\Psi_B$ provided that $\alpha_1(c)$ and $\alpha_2(c)$
are non-negative and satisfy the equations

$\alpha_1(c)A_{11} + \alpha_2(c)A_{12} = 0$

$\alpha_1(c)A_{21} + \alpha_2(c)A_{22} = 1$

where

$A_{11} = \int_0^\infty I_c(x)A(x)\varphi(x)\,dx - \beta c$,

$A_{12} = \int_0^\infty A_c(x)A(x)\varphi(x)\,dx - \beta A(c)$,

$A_{21} = 2[\Phi(c) - 0.5]$,

$A_{22} = 2\int_0^c A'(x)\varphi(x)\,dx = 2B[\Phi(c + B) + \Phi(c - B) - 1]$,

with $\beta = \epsilon/(1 - \epsilon)$. Clearly, $A_{21} > 0$ and $A_{22} > 0$. Also $A_{11} > 0$ because

$\int_0^\infty I_{c_1}(x)A(x)\varphi(x)\,dx - \beta c_1 = 0$

and $c < c_1$ implies that $I_c(x)/c \ge I_{c_1}(x)/c_1$ for all $x \ge 0$ with strict inequality for
$0 < x < c$. Analogously, using the fact that $c > c_2$ implies $A_c(x)/A(c) \le A_{c_2}(x)/A(c_2)$ for all
$x \ge 0$ with strict inequality for $0 < x < c$, together with
$\int_0^\infty A_{c_2}(x)A(x)\varphi(x)\,dx - \beta A(c_2) = 0$, one concludes that $A_{12} < 0$.

Therefore, for all $c_2 < c < c_1$ we have $\alpha_1(c) = -A_{12}/[A_{11}A_{22} - A_{12}A_{21}] > 0$ and
$\alpha_2(c) = -\alpha_1(c)A_{11}/A_{12} > 0$. Moreover, since $\phi_{c_1}(x) = k_1 I_{c_1}(x) = \psi_1(x)$ and
$\phi_{c_2}(x) = k_2 A_{c_2}(x) = \psi_2(x)$,
and since $GES(\phi_c)$ is a continuous function of $c$, there exists $c_2 < c^* < c_1$ such that
$GES(\psi) = GES(\phi_{c^*})$. Let

$\theta(x) = \alpha_1(c^*)I(x) + \alpha_2(c^*)A(x)$.
Notice that $\theta(x)$ is strictly increasing and that $\theta(x) \ge \psi(x)$ for all $x \ge c^*$ [because
$\theta(c^*) = \alpha_1(c^*)I(c^*) + \alpha_2(c^*)A(c^*) = GES(\phi_{c^*}) = GES(\psi) = \psi(\infty)$]. Thus

$\int_0^\infty [\psi(x) - \theta(x)]^2\varphi(x)\,dx \ge \int_0^\infty [\phi_{c^*}(x) - \theta(x)]^2\varphi(x)\,dx$.

The theorem follows now because

$\int_0^\infty [\psi(x) - \theta(x)]^2\varphi(x)\,dx = J(\psi) + d$

and

$\int_0^\infty [\phi_{c^*}(x) - \theta(x)]^2\varphi(x)\,dx = J(\phi_{c^*}) + d$,

with $d = \int_0^\infty \theta^2(x)\varphi(x)\,dx - [\alpha_1(c^*) + 2\alpha_2(c^*)\beta\,GES(\psi)]$.

3.4  Numerical  Results 

The numerical calculation of the optimal $\psi^*$ is done as follows. For a given value
of $\epsilon$ ($\epsilon = 0.05$, say) the constants $c_1$ and $c_2$ are determined by solving the non-linear
equations

$(1 - \epsilon)\int_0^{c_1} x A(x)\varphi(x)\,dx + (1 - \epsilon)c_1[\Phi(c_1 + B) - \Phi(c_1 - B)] - c_1\epsilon = 0$

and

$(1 - \epsilon)\left\{\int_0^{c_2} A^2(x)\varphi(x)\,dx + A(c_2)[\Phi(c_2 + B) - \Phi(c_2 - B)]\right\} - A(c_2)\epsilon = 0$.

Our numerical results show that in general $c_1 > c_2$.
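
A minimal sketch of how the two truncation constants might be computed is given below. It assumes known scale, the known-scale form A(x) = 2 exp(-B^2/2) sinh(Bx), Simpson integration and a fixed bisection bracket; the helper names, the bracket [0.001, 10] and the choice of B as 10% above the median's maximum bias are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def A(x, B):
    """A(x) = [phi(x - B) - phi(x + B)] / phi(x) = 2 exp(-B^2/2) sinh(B x)."""
    return 2.0 * math.exp(-0.5 * B * B) * math.sinh(B * x)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def eq_c1(c, eps, B):
    """Left-hand side of the equation defining c1 (truncated identity)."""
    integral = simpson(lambda x: x * A(x, B) * N01.pdf(x), 0.0, c)
    tail = N01.cdf(c + B) - N01.cdf(c - B)
    return (1.0 - eps) * integral + (1.0 - eps) * c * tail - c * eps

def eq_c2(c, eps, B):
    """Left-hand side of the equation defining c2 (truncated A)."""
    integral = simpson(lambda x: A(x, B) ** 2 * N01.pdf(x), 0.0, c)
    tail = N01.cdf(c + B) - N01.cdf(c - B)
    return (1.0 - eps) * (integral + A(c, B) * tail) - A(c, B) * eps

def solve_sign_change(f, lo=1e-3, hi=10.0, iters=50):
    """Bisection assuming f(lo) > 0 > f(hi) with a single sign change."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps = 0.05
B = 1.1 * N01.inv_cdf(0.5 / (1.0 - eps))   # assumed target bias, 10% above the median's
c1 = solve_sign_change(lambda c: eq_c1(c, eps, B))
c2 = solve_sign_change(lambda c: eq_c2(c, eps, B))
```

For small B the function A is nearly linear on the relevant range, so the two constants come out close to each other, which is consistent with the optimal psi-function being close to Huber's.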

Next, using the fact that for all $k > 0$, $\psi(x)$ and $k\psi(x)$ determine the same location
estimate, it follows that the optimal $\psi^*$ is a truncation at $c$ of the function
$\alpha x + (1 - \alpha)A(x)$, for some $0 \le \alpha \le 1$ and $c_2 \le c \le c_1$.

This psi-function is denoted by $\phi_c(x)$ and for each $c$, the value of $\alpha$ is determined
using C2, i.e., solving the linear equation

$(1 - \epsilon)\int_0^c [\alpha x + (1 - \alpha)A(x)]A(x)\varphi(x)\,dx + [\alpha c + (1 - \alpha)A(c)]\{[\Phi(c + B) - \Phi(c - B)](1 - \epsilon) - \epsilon\} = 0$.

Finally, the asymptotic variance of $\phi_c$, $\mathrm{VAR}(\phi_c, \Phi)$, can be computed. This is done
on a fine grid of values of $c$ in $[c_2, c_1]$ and $c^*$ is approximated by
the grid point that minimizes $\mathrm{VAR}(\phi_c, \Phi)$. In general, as shown by Figure 2, the
optimal $\psi^*(x)$ is very well approximated by the Huber's function $\psi_{c_1}(x)$ having the
same asymptotic bias $B$, even for values of $\epsilon$ near the breakdown point one half.

Figure 3 shows the maximum bias curve of the optimal $\psi^*$ for several efficiencies.
Notice that a significant increase in efficiency can be obtained in exchange for a fairly
small increase in bias.

Finally we present some numerical results which allow a direct comparison of our
exact approach with that based on the GES linear approximation. Suppose that,
for a given value of $\epsilon$, we want to choose a robust location estimate according to the
following criterion: among all the location estimates $T$ which have a bias-deficiency of
up to 10%, i.e. all $T$ for which

$\frac{B_T(\epsilon)}{B_{MED}(\epsilon)} \le 1.10$,

choose the $T$ that minimizes the asymptotic variance under the Gaussian model. Note
from Figure 2 that a bias deficiency of 10% corresponds to an efficiency constraint in
the range 85%-90%.

According to the GES approach one would first approximate $B_T(\epsilon)$ by $\epsilon \cdot GES(T)$.
By Hampel's optimality result we can restrict attention to M-estimates with Huber's
$\psi$-functions $\psi_c$. Then we just need to choose the tuning constant $c$ to achieve the 10%
bias deficiency. Since $GES(\psi_c) = c/(2\Phi(c) - 1)$ the constant $c$ is determined by the
non-linear equation

$\frac{\epsilon c}{2\Phi(c) - 1} = 1.1\,\Phi^{-1}\left[\frac{1}{2(1 - \epsilon)}\right]$.

On the other hand, following the exact global approach one would set the maximum
bias $B$ equal to $1.1\,\Phi^{-1}[.5/(1 - \epsilon)]$ and choose the optimal $\psi$-function given by Theorem
2 for such value of $B$. In view of Figure 2 a good approximation for the optimal $\psi$ can
be obtained by restricting attention to $\psi$-functions of the Huber type. In this case the
tuning constant $c$ is determined by solving the equation

$B_{\psi_c}(\epsilon) = 1.1\,\Phi^{-1}\left[\frac{1}{2(1 - \epsilon)}\right]$.

Table 1 gives the values of the constant $c$ obtained by the GES and exact approaches
as well as the corresponding bias deficiencies. Notice that the values of $c$ given by the
exact approach do not change much with $\epsilon$ (differences only occurred in the third
decimal place) and that the resulting estimate is fairly efficient (eff = 0.85). On the
other hand, the values of $c$ given by the GES approximation vary considerably and
tend to be disturbingly unconservative, particularly for moderate to large values of
$\epsilon$. For example, the actual bias deficiencies of the estimates chosen according to the
GES approach are 17% for $\epsilon = 0.05$, 22% for $\epsilon = 0.10$, 30% for $\epsilon = 0.15$ and 63% for
$\epsilon = 0.30$, instead of the nominal 10%.
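
The two ways of choosing the Huber tuning constant can be sketched as follows, assuming known scale. The closed form used for g(t, 1) is our own evaluation of E[psi_c(t - Z)] for standard normal Z, and all function names and bisection brackets are assumptions for this example; the sketch is not meant to reproduce Table 1 exactly.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def g_huber(t, c):
    """g(t, 1) for Huber's psi_c in closed form: E[psi_c(t - Z)], Z ~ N(0, 1)."""
    return (t * (N01.cdf(c - t) - N01.cdf(-c - t))
            + N01.pdf(c + t) - N01.pdf(c - t)
            + c * (1.0 - N01.cdf(c - t)) - c * N01.cdf(-c - t))

def bisect(f, lo, hi, iters=80):
    """Root of a function f that is negative at lo and positive at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def huber_max_bias(eps, c):
    """Exact maximum bias: solve (1 - eps) * g(B, 1) = eps * c for B."""
    return bisect(lambda t: (1.0 - eps) * g_huber(t, c) - eps * c, 0.0, 50.0)

def huber_ges(c):
    """GES of Huber's psi_c at the standard normal: c / (2 * Phi(c) - 1)."""
    return c / (2.0 * N01.cdf(c) - 1.0)

def median_max_bias(eps):
    """Maximum bias of the median: Phi^{-1}[0.5 / (1 - eps)]."""
    return N01.inv_cdf(0.5 / (1.0 - eps))

def c_by_ges(eps, deficiency=1.1):
    """Tuning constant from the linear approximation eps * GES(psi_c)."""
    target = deficiency * median_max_bias(eps)
    return bisect(lambda c: eps * huber_ges(c) - target, 1e-6, 20.0)

def c_by_exact(eps, deficiency=1.1):
    """Tuning constant from the exact maximum bias of psi_c."""
    target = deficiency * median_max_bias(eps)
    return bisect(lambda c: huber_max_bias(eps, c) - target, 1e-6, 20.0)
```

Evaluating `huber_max_bias(eps, c_by_ges(eps)) / median_max_bias(eps)` gives the actual bias deficiency of the GES-based choice, which exceeds the nominal 1.10 because the exact bias curve lies above its tangent at the origin.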


4. EFFICIENCY CONSTRAINED SOLUTION WITH SCALE UNKNOWN

When the scale is unknown the side constraint C2 must be replaced by C3

$(1 - \epsilon)\,g(B, \bar{s}) = \epsilon\,\psi(\infty)$

and consequently the function $A(x)$ must be replaced by

$A(x) = \frac{\varphi(\bar{s}x - B) - \varphi(\bar{s}x + B)}{\varphi(x)}$.

Unfortunately, $A(x)$ is no longer monotone (notice that $\lim_{x \to \infty} A(x) = 0$) and so
Lemma 3 cannot be applied to ensure the existence of a scaled and truncated version
of $A(x)$ which satisfies C1 and C3. However, numerical calculations indicate that the
result of Lemma 3 still holds in this case, that is, that there exist constants $k$, $c_3$ and
$c'_3$ (which depend on $\epsilon$) such that the function

$\psi_3(x) = \begin{cases} kA(x), & |x| \le c_3 \text{ or } |x| \ge c'_3 \\ kA(c_3)\,\mathrm{sgn}(x), & c_3 < |x| < c'_3 \end{cases}$

satisfies C1 and C3. It can also be proved, using a similar argument as in the proof
of Theorem 1, that the function $\psi_3$ has the same property as the function $\psi_2$ in that
theorem:

$GES(\psi) \ge GES(\psi_3)$ implies $\mathrm{VAR}(\psi, \Phi) \ge \mathrm{VAR}(\psi_3, \Phi)$

for all $\psi \in \Psi_B$. Thus, as in the known-scale case, attention can be restricted to
psi-functions $\phi_c(x)$ which are truncated and rescaled versions of

$\phi(x) = \alpha x + (1 - \alpha)A(x), \quad 0 \le \alpha \le 1$

with truncation constant $c$ between $c_1$ and $c_3$. Here $c_1$ is the value of the tuning
constant of the Huber's psi-function satisfying C1 and C3, and its existence follows
from Lemma 3.

If $0 \le \alpha \le 1$ is such that $\phi(x) \ge \phi(c)$ for all $x \ge c$ then

$\phi_c(x) = \begin{cases} \phi(x), & |x| \le c \\ \phi(c)\,\mathrm{sgn}(x), & |x| > c. \end{cases}$

On the other hand if $\alpha$ is such that $\phi(x)$ is not monotone, as in Figure 4, then

$\phi_c(x) = \begin{cases} \phi(x), & |x| \le c \text{ or } c' \le |x| \le c'' \\ \phi(c)\,\mathrm{sgn}(x), & c < |x| < c' \text{ or } |x| > c''. \end{cases}$

Unfortunately, the lack of monotonicity of $A(x)$ makes the optimality problem much
more involved and one must resort to a combination of analytical derivations and
numerical calculations to obtain the optimal $\psi^*$. The main conclusion from our
calculations is that, as in the known scale case, Huber's psi-function $\psi_1$ with the tuning
constant $c^*$ determined by the condition C3 is an excellent approximation to the
optimal $\psi^*$.

Evidently, $c^* = c^*(\epsilon, \bar{s})$ depends on the fraction of contamination $\epsilon$ and the
maximum value $\bar{s}$ of the asymptotic scale functional. In fact, it can be easily verified,
using the identity $\psi_c(x/s) = (1/s)\,\psi_{cs}(x)$ valid for Huber's psi-functions for all $s > 0$, that
$c^* = c^*(\epsilon, \bar{s}) = c(\epsilon)/\bar{s}$, where $c(\epsilon)$ is the value of the tuning constant for the nearly
optimal Huber's psi-function in the scale-known case. Thus, the tuning constant $c^*$
for the nearly optimal Huber's psi-function is larger when the maximum asymptotic
functional $\bar{s}$ of the scale estimate $s_n$ is smaller. Since for Huber's psi-functions $\psi_c$ the
asymptotic variance at the Gaussian model is a decreasing function of $c$, it becomes evident that
the degree of unconstrained bias-robustness of the scale estimate $s_n$ will have an
impact on the optimal bias-robust location estimate subject to an efficiency constraint.
Therefore, according to the results in Martin and Zamar (1990), an appropriate choice
of the scale estimate $s_n$ is given by the shorth (see Andrews et al., 1972), which is
nearly optimal bias-robust among M-estimates of scale with breakdown point equal to
one half.
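
The scaling identity and the resulting rescaling of the tuning constant can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names are illustrative, `s_bar` stands for the maximum asymptotic value of the scale functional, and the value 1.345 is used only as an example of a scale-known tuning constant.

```python
def huber_psi(x: float, c: float) -> float:
    """Huber's psi-function: x clipped to the interval [-c, c]."""
    return max(-c, min(c, x))


def identity_holds(c: float, s: float, xs) -> bool:
    """Check psi_c(x / s) == (1 / s) * psi_{c s}(x) on a grid of points."""
    return all(
        abs(huber_psi(x / s, c) - huber_psi(x, c * s) / s) < 1e-12
        for x in xs
    )


def rescaled_tuning_constant(c_known_scale: float, s_bar: float) -> float:
    """Tuning constant c* = c(eps) / s_bar for the unknown-scale case.

    c_known_scale plays the role of c(eps); s_bar is an assumed value of
    the maximum asymptotic scale functional.
    """
    return c_known_scale / s_bar


if __name__ == "__main__":
    grid = [i / 10.0 for i in range(-50, 51)]
    print(identity_holds(1.345, 0.7, grid))
    print(rescaled_tuning_constant(1.345, 0.5))
```

For example, `rescaled_tuning_constant(1.345, 0.5)` gives a larger constant than `rescaled_tuning_constant(1.345, 1.0)`, in line with the remark that a smaller maximum scale functional leads to a larger tuning constant.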
5.  CONCLUDING  REMARKS 

It  has  been  correctly  pointed  out  by  an  anonymous  referee  that  the  results  of  this 
paper  amount  to  a  rigorous  analysis  of  the  correctness  of  Hampel’s  heuristic  approach 
in  the  location  setup.  Still,  one  might  conjecture  by  analogy  that,  in  the  case  of 
more  complicated  models,  maximizing  the  efficiency  under  a  constraint  on  the  gross- 
error-sensitivity  is  almost  the  same  as  putting  a  constraint  on  the  bias.  However,  the 
extension  of  our  technique  to  more  complicated  models  is  by  no  means  straightforward 
(if at all possible) and this matter deserves further study.

Another interesting issue brought up by an anonymous referee is that if one leaves
the realm of M-estimates of location, there is not necessarily a trade-off between
bias-robustness and efficiency at the central Gaussian model. This is shown by the following
example (also provided by the referee): Let $h : R \to R$ be symmetric, differentiable and
non-increasing on $[0, \infty)$ with $h(x) = 1$ for $0 \le x \le 1$ and $h(x) = 0$ for $x \ge 2$. Denote
the mean and the median of a distribution $F$ by $T_M(F)$ and $T_{MED}(F)$ respectively.
For any $\delta > 0$ let

$$T_\delta(F) = T_{MED}(F) + \left(T_M(F) - T_{MED}(F)\right) h\!\left(\frac{T_M(F) - T_{MED}(F)}{\delta}\right)$$

where $0 \cdot \infty = 0$ and $T_M(F) = \infty$ if $F$ doesn't have a finite mean. If the scale also has
to be estimated, $\delta$ can be replaced by $\delta\, MAD(F)$.

Since

$$|T_\delta(F) - T_{MED}(F)| \le C\delta, \qquad C = \sup_{x > 0} x\,h(x) \le 2,$$

the maximum bias of $T_\delta(F)$ exceeds that of the median by at most $2\delta$. If $F$ is such
that $T_M(F) - T_{MED}(F) = 0$ (e.g. $F$ is symmetric, with finite mean) then

$$T_\delta(F_n) = T_M(F_n), \quad \text{a.s. } [F],$$

for $n$ sufficiently large. Therefore, $T_\delta(F_n)$ has an asymptotic efficiency of 1 when
$F = N(0, 1)$ but an efficiency of 0 if $F$ has infinite variance.
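
The referee's construction can be sketched at the sample level as follows. This is an illustration under stated assumptions rather than the referee's own code: the particular weight function `h` (a cubic smoothstep between 1 and 2) is only one admissible choice satisfying the stated conditions, and the function names are hypothetical.

```python
import math
import statistics


def h(x: float) -> float:
    """Symmetric, differentiable weight: 1 on [0, 1], 0 beyond 2.

    Between 1 and 2 a cubic smoothstep is used (an assumed choice);
    its derivative vanishes at both ends, so h is differentiable
    and non-increasing on [0, inf).
    """
    a = abs(x)
    if a <= 1.0:
        return 1.0
    if a >= 2.0:
        return 0.0
    t = a - 1.0
    return 1.0 - (3.0 * t * t - 2.0 * t ** 3)


def t_delta(sample, delta: float) -> float:
    """Median plus the (mean - median) difference, down-weighted by h."""
    med = statistics.median(sample)
    d = statistics.fmean(sample) - med
    if not math.isfinite(d):
        return med  # convention 0 * inf = 0
    return med + d * h(d / delta)


if __name__ == "__main__":
    clean = [0.0, 1.0, 2.0, 3.0, 4.5]       # mean close to median
    dirty = [0.0, 1.0, 2.0, 3.0, 1000.0]    # one gross outlier
    print(t_delta(clean, 0.5))  # behaves like the mean
    print(t_delta(dirty, 0.5))  # falls back to the median
```

When the mean and median differ by less than `delta` the estimate coincides with the sample mean, and when they differ by more than `2 * delta` it coincides with the sample median, which is the mechanism behind the efficiency and bias claims in the example.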

The following argument shows that there are some merits in considering the
optimality problem on the class of M-estimates of location. It is well known that bias-robust
M-estimates of location, that is, M-estimates with a bounded $\psi$, are variance-robust:
the supremum of their asymptotic variance over symmetric $\epsilon$-contamination
neighborhoods of the Gaussian model is finite. This suggests that the efficient bias-robust
M-estimate is not only relatively efficient at the central Gaussian model but also over
a symmetric neighborhood of it. In fact, when the scale parameter is known one can
easily verify that, for Huber's $\psi_c$,

$$\sup_F AV(\psi_c, F) = \frac{(1 - \epsilon)\, E_\Phi \psi_c^2(X) + \epsilon c^2}{(1 - \epsilon)^2\, [2\Phi(c) - 1]^2},$$

where the supremum is taken over the symmetric $\epsilon$-contamination neighborhood.


Using this formula one can check that the supremum variance of Huber's M-estimate
of location with $c = 1.345$ (for instance) is fairly small: $AV = 1.053$ when $\epsilon = 0$,
$AV = 1.257$ when $\epsilon = .05$ and $AV = 1.795$ when $\epsilon = .15$.
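
The quoted values can be reproduced with a short calculation. The sketch below assumes the standard closed form for the supremum asymptotic variance of Huber's estimate over symmetric $\epsilon$-contamination neighborhoods with known scale, namely $[(1 - \epsilon) E_\Phi \psi_c^2 + \epsilon c^2] / [(1 - \epsilon)(2\Phi(c) - 1)]^2$; the function names are illustrative and not from the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def huber_sup_asymptotic_variance(c: float, eps: float) -> float:
    """Supremum asymptotic variance of Huber's location M-estimate.

    Assumed formula (known scale, symmetric eps-contamination):
        [(1 - eps) E psi_c^2 + eps c^2] / [(1 - eps) (2 Phi(c) - 1)]^2
    """
    p = 2.0 * std_normal_cdf(c) - 1.0  # P(|X| <= c) = E psi_c'(X)
    # E psi_c^2 = integral of x^2 over [-c, c] plus c^2 * P(|X| > c)
    e_psi2 = p - 2.0 * c * std_normal_pdf(c) + c * c * (1.0 - p)
    numerator = (1.0 - eps) * e_psi2 + eps * c * c
    denominator = ((1.0 - eps) * p) ** 2
    return numerator / denominator


if __name__ == "__main__":
    for eps in (0.0, 0.05, 0.15):
        print(eps, round(huber_sup_asymptotic_variance(1.345, eps), 3))
```

With $c = 1.345$ this yields values that agree with the three figures quoted above to three decimals, which supports the assumed form of the formula.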
Figure 2: OPTIMAL $\psi$-FUNCTIONS AND CORRESPONDING NEARLY OPTIMAL HUBER APPROXIMATES

Panels: (a) EPSILON = .1, BIAS = .175, ALPHA = .63; (b) EPSILON = .2, BIAS = .71, ALPHA = .86; (c) EPSILON = .4, BIAS = 1.21, ALPHA = .86; (d) EPSILON = .40, BIAS = 2.28, ALPHA = .76.


Figure 3: MAXIMUM BIAS CURVES OF SEVERAL BIAS-ROBUST ESTIMATES
The lowest curve corresponds to the median and the highest corresponds to the 95% efficient bias-robust
estimate. The intermediate ones correspond to the 75%, 80% and 90% efficient bias-robust estimates.


Figure 4: REDESCENDING $\phi$-FUNCTION